During the COVID-19 pandemic, universities faced the challenge of transitioning to online classes and exams, while privacy regulations limited the use of proctoring. Since illicit collaboration during online exams could not be reliably prevented, we propose a data-driven method to identify potential collusion among students after an exam. Our method uses an alternative distance measure and hierarchical clustering algorithms to pinpoint groups of students with remarkably similar exam results, building upon previous research that used exam event logs to detect collusion. Additionally, we present an approach to classify groups as “outstandingly similar” using a proctored comparison group; details on our methodology and results are provided in the subsequent sections of our paper.
The study utilized data from the Descriptive Statistics course at the University of Duisburg-Essen, Germany, comparing a test group, which took an unproctored exam at home during the COVID-19 pandemic, with a comparison group that took a proctored exam in class before the pandemic. Both groups’ exams encompassed arithmetical problems, R programming tasks, and a short essay task. During the exams, students’ activities and time stamps were recorded in event logs, and the points achieved per task were noted. The dataset was cleaned to remove students with minimal participation or achievement and those who experienced internet issues. Despite the differing exam formats, both groups remain comparable, as the exams shared similar content and learning goals.
The agglomerative (bottom-up) hierarchical clustering algorithm can be described by the following equations:
\[D(x_i, x_{i'}) = \frac{1}{h} \sum_{j=1}^h w_j \cdot d_j(x_{ij}, x_{i'j})\]
\(D(x_i, x_{i'})\) is the global pairwise dissimilarity, while \(d_j(x_{ij}, x_{i'j})\) denotes the pairwise attribute dissimilarity. The weights \(w_j\) sum up to 1. The index \(i = 1, \ldots, N\) enumerates the \(N = 151\) students, while \(j\) is the index for each of the \(h\) attributes.
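The weighted dissimilarity above can be sketched in a few lines of Python; the per-attribute dissimilarity functions and the toy values below are illustrative assumptions, not the paper's actual data:

```python
def global_dissimilarity(x_i, x_ip, weights, dissims):
    """D(x_i, x_i') = (1/h) * sum_j w_j * d_j(x_ij, x_i'j),
    with the weights w_j summing to 1."""
    h = len(weights)
    return sum(w * d(a, b) for w, d, a, b in zip(weights, dissims, x_i, x_ip)) / h

# Illustrative example with h = 2 attributes and absolute
# difference as the per-attribute dissimilarity d_j.
abs_diff = lambda a, b: abs(a - b)
D = global_dissimilarity([1.0, 2.0], [3.0, 5.0],
                         [0.5, 0.5], [abs_diff, abs_diff])
# D = (1/2) * (0.5 * 2 + 0.5 * 3) = 1.25
```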
We compared two different kinds of attributes, namely the dissimilarities in the students’ event patterns (times of submission), defined as \(d_j^L(v_{ij}, v_{i'j})\), and the dissimilarities in the points achieved, \(d_j^P(s_{ij}, s_{i'j})\).
\[D(s_i, s_{i'}, v_i, v_{i'}) = \frac{1}{h} \sum_{j=1}^h (w_j^P \cdot d_j^P (s_{ij}, s_{i'j}) + w_j^L \cdot d_j^L (v_{ij}, v_{i'j}))\]
is the combined model, where the weights \(w_j^P\) and \(w_j^L\) jointly add up to one.
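A minimal sketch of the combined model followed by average linkage clustering, assuming absolute differences as the per-task dissimilarities and an even weight split between points and event logs (both choices are illustrative, not the paper's exact specification; the toy data is random):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical toy data: n students, h tasks; names are illustrative.
rng = np.random.default_rng(0)
n, h = 6, 4
s = rng.integers(0, 10, size=(n, h)).astype(float)  # points per task
v = rng.uniform(0, 120, size=(n, h))                # submission times (minutes)

w_P = np.full(h, 0.5 / h)  # weights for point dissimilarities
w_L = np.full(h, 0.5 / h)  # weights for event-log dissimilarities (all sum to 1)

# Pairwise combined dissimilarity matrix D(s_i, s_i', v_i, v_i')
D = np.zeros((n, n))
for i in range(n):
    for k in range(i + 1, n):
        d_P = np.abs(s[i] - s[k])  # per-task point dissimilarity (example choice)
        d_L = np.abs(v[i] - v[k])  # per-task event dissimilarity (example choice)
        D[i, k] = D[k, i] = (w_P @ d_P + w_L @ d_L) / h

# Agglomerative clustering with average linkage, as used in the paper
Z = linkage(squareform(D), method="average")
```

The resulting linkage matrix `Z` is what a dendrogram such as Figures 1 and 2 visualizes.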
Figure 1: Dendrogram produced by average linkage clustering of the proctored control group (2018/19). G–L mark the clusters with the lowest dissimilarity.
Figure 1 displays the dendrogram of the proctored exam, which we used as a control group to establish a baseline for comparison. This dendrogram provides a visualization of what to expect without any collusion. Figure 2 depicts the test group, in which students were allowed to participate in the exam from home. The dendrogram of the test group reveals a lower overall level of dissimilarity compared to the control group. Notably, the six clusters with the lowest dissimilarity, labeled A-F, stand out significantly from the rest of the cohort.
Figure 2: Dendrogram produced by average linkage clustering of the unproctored test group (2020/21). A–F mark the clusters with the lowest dissimilarity.
A comparison of the normalized distributions of the dissimilarity measures (Figure 3) further shows that three points from the test group, marked by a red circle, are considerably distinct from the rest of the data points.
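One simple way to flag such outstandingly low dissimilarities, assuming they are available as a flat array, could look as follows (the z-score threshold is an illustrative choice, not the paper's criterion):

```python
import numpy as np

def flag_low_outliers(values, z=3.0):
    """Flag normalized dissimilarities that lie unusually far below the
    mean, i.e. suspiciously similar pairs. The threshold z is illustrative."""
    v = np.asarray(values, dtype=float)
    scores = (v - v.mean()) / v.std()
    return scores < -z
```

Pairs flagged this way would then be inspected manually, for example via per-task comparisons of event logs and points as in Figure 4.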
Figure 3: Comparison of the normalized distance measures.
Figure 4: Comparison of the event logs and achieved points of the clusters E and F from the test group (2020/21). Above the scatter plot, a bar chart is added to compare the points per subtask.
In our discussion, we interpret the results of the hierarchical clustering algorithms, which are visually represented through a dendrogram, a tree-like structure. After comparing various algorithms, we find average linkage clustering to be the most fitting for our analysis: it helps us identify compact clusters, notably clusters A, B, and E, while indicating no larger-scale group collusion. Additional visual tools such as scatter plots and bar charts aid in examining student similarities within these clusters. The comparison with a reference group supports the effectiveness of our method in detecting collusion, but limitations remain because the ground truth is unknown. Despite this, our approach not only helps deter cheating in unproctored exams but also contributes to the broader digital transformation of education, preparing us for unforeseen future challenges similar to the COVID-19 pandemic.
Future studies might focus on exploring the long-term effectiveness of the detection method in deterring students from colluding in exams, and its impact on academic integrity and student behavior. The development and implementation of methods to collect and analyze complementary evidence might also be of interest for further research, with the aim of improving detection rates and understanding the extent of collusion among students.
Cleophas, C., C. Hoennige, F. Meisel, and P. Meyer (2021). “Who’s Cheating? Mining Patterns of Collusion from Text and Events in Online Exams” (April 12, 2021).
Hellas, A., J. Leinonen, and P. Ihantola (2017). “Plagiarism in Take-Home Exams: Help-Seeking, Collaboration, and Systematic Cheating”. In: ITiCSE ’17. Bologna, Italy: Association for Computing Machinery, pp. 238-243. ISBN: 9781450347044. DOI: 10.1145/3059009.3059065.
Hemming, A. (2010). “Online tests and exams: lower standards or improved learning?” In: The Law Teacher 44.3, pp. 283-308.
Hollister, K. K. and M. L. Berenson (2009). “Proctored versus unproctored online exams: Studying the impact of exam environment on student performance”. In: Decision Sciences Journal of Innovative Education 7.1, pp. 271-294.
Leinonen, J., K. Longi, A. Klami, A. Ahadi, and A. Vihavainen (2016). “Typing patterns and authentication in practical programming exams”, pp. 160-165.